Masking a Compact AES S-box


D.  Canright 

Applied  Mathematics  Dept.,  Code  MA/Ca 
Naval  Postgraduate  School 
Monterey,  CA  93943 

7  August  2007 


Abstract 

When  the  Advanced  Encryption  Standard  (AES)  is  implemented  in  hardware  or 
software,  it  may  be  vulnerable  to  “side-channel  attacks”  such  as  differential  power 
analysis.  One  countermeasure  against  such  attacks  is  adding  a  random  mask  to  the 
data;  this  randomizes  the  statistics  of  the  calculation  at  the  cost  of  computing  “mask 
corrections.”  The  single  nonlinear  step  in  each  round  of  the  AES  algorithm  is  called 
the  “S-box,”  which  involves  the  greatest  computational  cost  in  a  round  (to  find  the 
inverse  in  the  Galois  field),  as  well  as  the  greatest  cost  for  mask  corrections.  Oswald  et 
al.  [9]  showed  how  the  “tower  field”  representation  allows  maintaining  an  additive  mask 
throughout the Galois inverse calculation. This work combines that masking approach
with the compact S-box of Canright, to give a masked S-box that requires minimal
circuitry, and hence minimal chip area.


1  Introduction 

The  Advanced  Encryption  Standard  (AES)  was  specified  in  2001  by  the  National  Institute 
of  Standards  and  Technology  [8],  to  provide  a  standard  algorithm  for  secure  encryption, 
intended  not  only  for  U.S.  government  documents,  but  also  for  electronic  commerce. 

Many  different  implementations  of  AES  have  appeared,  to  satisfy  the  varying  criteria 
of  different  applications.  Some  approaches  seek  to  maximize  throughput,  e.g.,  [6],  [15]  and 
[5];  others  minimize  power  consumption,  e.g.,  [7];  and  yet  others  minimize  circuitry,  e.g., 
[12], [13], [16], and [3]. For the latter goal, Rijmen [11] suggested using subfield arithmetic in
the crucial step of computing an inverse in the Galois field of 256 elements. This idea was
further extended by Satoh et al. [13], using sub-subfields (the “tower field” representation
of Paar [10]), along with other innovative optimizations, which resulted in the smallest AES
circuit at that point. The architecture of Satoh was refined somewhat by Canright [2], mainly
through carefully chosen normal bases, resulting in the most compact S-box to date.

No  attacks  have  yet  been  found  on  the  AES  algorithm  itself  that  are  more  effective 
than  exhaustive  key  search  (“brute  force”),  although  research  continues,  for  example,  on 
algebraic attacks. But hardware implementations of cryptography, e.g. in smart cards, may be
vulnerable to “side-channel attacks” such as differential power analysis, which use statistical
analysis of side effects like power consumption, electromagnetic radiation, etc., to deduce
information about the secret key.

One  countermeasure  against  side-channel  attacks  is  masking  the  data  during  calculation 
through  adding  or  multiplying  by  random  values.  All  the  steps  in  a  round  of  AES  are  affine, 
except for the Galois field inversion substep of the S-box (SubBytes) step. For the other
steps, calculation of the mask correction is linear, so an additive mask is most convenient.
Some have suggested switching to a multiplicative mask for the Galois inverse step (e.g., [1]),
but one inescapable weakness is that a zero data byte is unmasked by multiplication [4].

Applying the “tower field” representation, inversion in $GF(2^8)$ involves several multiplications
and one inversion in the subfield $GF(2^4)$, which in turn involves multiplications and
inversion in $GF(2^2)$. In the sub-subfield $GF(2^2)$, inversion is identical to squaring, and so
is linear (over $GF(2)$). Oswald et al. applied this idea to additive masking of the Galois
inverse,  and  showed  how  to  compute  the  mask  correction  for  the  tower  field  approach.  Many 
of  the  correction  terms  involve  multiplication  in  subfields,  and  Oswald  et  al.  showed  how 
some  of  these  multiplications  can  be  eliminated  through  clever  re-use  of  parts  of  the  input 
mask  for  the  output. 

The present work incorporates this masking approach into the compact S-box of Canright [2].
Applying the same optimizations used there for the unmasked S-box to the mask correction
terms here results in a compact masked S-box.

2  Algebraic  description 

The  AES  algorithm  has  been  described  thoroughly  and  frequently  elsewhere  [8];  here  we 
give  the  barest  outline  before  concentrating  on  the  S-box.  It  is  a  symmetric  block  (16  bytes) 
cipher  consisting  of  several  rounds  (10,  12,  or  14,  depending  on  key  size).  Each  round  involves 
the  four  steps  called  SubBytes  (byte  substitution,  or  S-box),  ShiftRows,  MixColumns,  and 
AddRoundKey (the last round skips MixColumns, and there is a Round 0 consisting solely of
AddRoundKey). The latter three steps are linear with respect to the data block, and provide
“diffusion.”  SubBytes  is  the  nonlinear  step  that  provides  “confusion.” 

The S-box, applied to each byte, consists of two substeps: (i) considering the byte an
element of the Galois field $GF(2^8)$, find its inverse in that field; (ii) considering the resulting
byte a vector of bits in $GF(2)$, multiply by a given bit matrix and add a given constant
vector, i.e., an affine transformation.
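
The two substeps can be sketched in a few lines of Python. This is only an illustrative reference model in the standard AES polynomial representation, not the compact tower-field circuit developed in this paper; the function names (`gf256_mul`, `gf256_inverse`, `sbox`) are chosen here for illustration, and the inverse is computed naively as the 254th power.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo q(x) = x^8 + x^4 + x^3 + x + 1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:          # reduce when the degree reaches 8
            a ^= 0x11B
        b >>= 1
    return product


def gf256_inverse(a: int) -> int:
    """Substep (i): a^254 equals a^-1 for nonzero a, and maps 0 to 0."""
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a)
    return result


def rotl8(x: int, k: int) -> int:
    """Rotate a byte left by k bit positions."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sbox(byte: int) -> int:
    """Substep (ii): bit-matrix multiply (as rotations) plus the constant 0x63."""
    inv = gf256_inverse(byte)
    linear = inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4)
    return linear ^ 0x63
```

The rotations are just a compact way of writing the standard AES bit matrix; only the inversion in substep (i) is nonlinear.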

In the particular Galois field of AES, a byte represents a polynomial where the bits
are coefficients of corresponding powers of $x$, and multiplication is modulo the irreducible
polynomial $q(x) = x^8 + x^4 + x^3 + x + 1$. Equivalently, one could consider a root, say $\theta$, of this
polynomial, so $q(\theta) = 0$ in this field; then the bits of a byte would correspond to coefficients
of powers of $\theta$, e.g., $2 = \theta$, $3 = \theta + 1$, $4 = \theta^2$, etc. Thus the bits form a vector with respect to
what is called a polynomial basis. But there are computational advantages to considering
a different (though isomorphic) representation of $GF(2^8)$. Instead of a vector of dimension
eight over $GF(2)$, we consider a byte as a vector of dimension two over $GF(2^4)$, where each
4-bit element is in turn a vector of dimension two over $GF(2^2)$, and finally each 2-bit element
is a vector of dimension two over $GF(2)$. This has been called a composite field, or “tower
field” representation [10]. In this way, the 8-bit inverse calculation comprises several 4-bit
operations, each consisting of various simple 2-bit calculations. For each of these subfields,
we have found that a normal basis (consisting of a conjugate pair) is more efficient than a
polynomial basis for the required inverse calculation [2].
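
As a rough illustration of the polynomial basis and of the subfields that make up the tower, the following Python sketch works entirely in the standard AES representation. The helper names and the constant `THETA` are assumptions made for this example; the subfields are located simply as the elements fixed by raising to the 16th or 4th power, which is not how the compact S-box represents them.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo q(x) = x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def gf256_pow(a: int, exponent: int) -> int:
    """Square-and-multiply exponentiation in GF(2^8)."""
    result = 1
    while exponent:
        if exponent & 1:
            result = gf256_mul(result, a)
        a = gf256_mul(a, a)
        exponent >>= 1
    return result


THETA = 0x02                      # the byte 2 represents the root of q(x)
theta_to_8 = gf256_pow(THETA, 8)  # reduces to x^4 + x^3 + x + 1

# Elements fixed by x -> x^16 form the subfield GF(2^4);
# elements fixed by x -> x^4 form the sub-subfield GF(2^2).
subfield_16 = [x for x in range(256) if gf256_pow(x, 16) == x]
subfield_4 = [x for x in range(256) if gf256_pow(x, 4) == x]
```

The nesting of `subfield_4` inside `subfield_16` inside the full field is the tower that the composite representation makes explicit.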

Converting between the standard AES representation and the composite field representation
amounts to a change of basis, accomplished by multiplying the bit vector by a bit
matrix. In converting back, this bit matrix can be combined with that of the affine transformation
substep [13]. With regard to an additive mask, these matrix multiplies are simple
linear  calculations  for  the  mask  correction  terms.  Below  we  detail  the  mask  corrections 
required  for  the  nonlinear  inverse  calculation. 
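
To illustrate why a bit-matrix multiply needs only a linear mask correction, here is a small Python sketch. It uses the standard AES affine map as a stand-in for the combined basis-change matrix (which is not reproduced here); the function names are hypothetical. The mask is passed through the linear part only, so the additive constant is applied exactly once.

```python
def rotl8(x: int, k: int) -> int:
    """Rotate a byte left by k bit positions."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def affine_linear(x: int) -> int:
    """Linear part of the AES affine map (a GF(2) bit-matrix multiply)."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4)


def affine(x: int) -> int:
    """Full affine map: linear part plus the constant 0x63."""
    return affine_linear(x) ^ 0x63


def masked_affine(masked_byte: int, mask: int) -> tuple[int, int]:
    """Apply the affine map to masked data; return (masked output, new mask)."""
    return affine(masked_byte), affine_linear(mask)


def correction_is_exact() -> bool:
    """Check every data/mask pair: unmasking the output gives affine(data)."""
    for data in range(256):
        for mask in range(256):
            out, out_mask = masked_affine(data ^ mask, mask)
            if out ^ out_mask != affine(data):
                return False
    return True
```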

2.1  Inversion  without  masking 

Here we employ the following convention: upper-case bold symbols represent elements of the
main field (e.g. $\mathbf{A} \in GF(2^8)$); upper-case italic symbols are for elements of the subfield (e.g.
$A \in GF(2^4)$); lower-case bold is used for the sub-subfield (e.g. $\mathbf{a} \in GF(2^2)$); and lower-case
italic is for single bits (e.g. $a \in GF(2)$).

Without masking, inversion in $GF(2^8)/GF(2^4)$ using a normal basis $[\mathbf{Y}^{16}, \mathbf{Y}]$, where $\mathbf{Y}$
and $\mathbf{Y}^{16}$ are the roots of $X^2 + X + N$ and $N \in GF(2^4)$ is the norm (product: $N = \mathbf{Y}^{17}$), is
given by:

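$$\mathbf{A} = A_1 \mathbf{Y}^{16} + A_0 \mathbf{Y} \quad \text{(given)} \tag{1}$$

$$B = A_1 \otimes A_0 \oplus N \otimes (A_1 \oplus A_0)^2 \tag{2}$$

$$\mathbf{A}^{-1} = (A_0 \otimes B^{-1})\, \mathbf{Y}^{16} + (A_1 \otimes B^{-1})\, \mathbf{Y} \quad \text{(result)} \tag{3}$$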

(Note that $\otimes$ and $\oplus$ denote multiplication and addition calculations in a Galois field, while
$A_1 \mathbf{Y}^{16} + A_0 \mathbf{Y}$ is just the algebraic expression for the vector $[A_1, A_0]$ in the normal basis.)
This requires inversion, multiplication, and the combined “square-scale” operation in the
subfield $GF(2^4)$. Similarly, the inversion in $GF(2^4)/GF(2^2)$ using a normal basis $[Z^4, Z]$,
where $Z$ and $Z^4$ are the roots of $X^2 + X + \mathbf{n}$ and $\mathbf{n} \in GF(2^2)$ is the norm ($\mathbf{n} = Z^5$), is given
by:

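$$B = \mathbf{b}_1 Z^4 + \mathbf{b}_0 Z \quad \text{(given)} \tag{4}$$

$$\mathbf{c} = \mathbf{b}_1 \otimes \mathbf{b}_0 \oplus \mathbf{n} \otimes (\mathbf{b}_1 \oplus \mathbf{b}_0)^2 \tag{5}$$

$$B^{-1} = (\mathbf{b}_0 \otimes \mathbf{c}^{-1})\, Z^4 + (\mathbf{b}_1 \otimes \mathbf{c}^{-1})\, Z \quad \text{(result)} \tag{6}$$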

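But in the sub-subfield $GF(2^2)$, inversion is the same as squaring, equivalent to a bit swap:

$$\mathbf{c} = c_1 \mathbf{w}^2 + c_0 \mathbf{w} \quad \text{(given)} \tag{7}$$

$$\mathbf{c}^{-1} = c_0 \mathbf{w}^2 + c_1 \mathbf{w} \quad \text{(result)} \tag{8}$$

where $\mathbf{w}$ and $\mathbf{w}^2$ are the roots of $x^2 + x + 1$.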
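
The top-level formula of this section, that is, a norm $B$ computed in the subfield followed by two subfield multiplications, can be checked numerically. The Python sketch below is one way to do so under stated assumptions: it embeds $GF(2^4)$ inside the standard AES field, and it picks the first byte `Y` whose conjugate pair sums to 1, which is not necessarily the basis element chosen in [2]. All names are illustrative.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def gf256_pow(a: int, exponent: int) -> int:
    """Square-and-multiply exponentiation in GF(2^8)."""
    result = 1
    while exponent:
        if exponent & 1:
            result = gf256_mul(result, a)
        a = gf256_mul(a, a)
        exponent >>= 1
    return result


# GF(2^4) embedded in GF(2^8): the 16 elements with x^16 = x.
SUBFIELD = [x for x in range(256) if gf256_pow(x, 16) == x]

# Assumed basis: any Y with Y^16 + Y = 1, so Y and Y^16 are the roots
# of X^2 + X + N with norm N = Y^17 lying in the subfield.
Y = next(y for y in range(256) if gf256_pow(y, 16) ^ y == 1)
Y16 = gf256_pow(Y, 16)
N = gf256_mul(Y, Y16)


def tower_inverse(a1: int, a0: int) -> tuple[int, int]:
    """Invert a1*Y^16 + a0*Y via the subfield norm; return (hi, lo) coordinates."""
    s = a1 ^ a0
    b = gf256_mul(a1, a0) ^ gf256_mul(N, gf256_mul(s, s))
    b_inv = gf256_pow(b, 14)      # b^14 = b^-1 inside GF(2^4)
    return gf256_mul(a0, b_inv), gf256_mul(a1, b_inv)


def formula_holds_everywhere() -> bool:
    """Check A * A^-1 = 1 for every nonzero coordinate pair."""
    for a1 in SUBFIELD:
        for a0 in SUBFIELD:
            if a1 == 0 and a0 == 0:
                continue
            a = gf256_mul(a1, Y16) ^ gf256_mul(a0, Y)
            hi, lo = tower_inverse(a1, a0)
            a_inv = gf256_mul(hi, Y16) ^ gf256_mul(lo, Y)
            if gf256_mul(a, a_inv) != 1:
                return False
    return True
```

The same structure repeats one level down, with the subfield inversion replaced by the bit swap of the sub-subfield.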


2.2  Masked  Inversion 

Now introduce additive masking. Adding a “random” mask, whose statistical
distribution is uniform over the field, makes our operands appear random as
well, uncorrelated to either plaintext or key. Hence the statistical data available through side
channels looks like noise, regardless of the chosen sets of plaintexts, and the key is protected.
The cost is the computation of mask correction terms.

We use the insight of Oswald et al. that in the sub-subfield $GF(2^2)$ inversion (squaring)
is additive, so for data $\mathbf{a}$ and mask $\mathbf{m}$,

$$(\mathbf{a} \oplus \mathbf{m})^{-1} = (\mathbf{a} \oplus \mathbf{m})^2 = \mathbf{a}^2 \oplus \mathbf{m}^2 = \mathbf{a}^{-1} \oplus \mathbf{m}^{-1} \tag{9}$$

and finding the mask correction $\mathbf{m}^2$ is trivial. Hence the tower-field approach eliminates the
need to remove the additive mask (or change it to a multiplicative one) before inversion.
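
The additivity of inversion in $GF(2^2)$ is small enough to check exhaustively. The Python sketch below assumes the normal-basis encoding used in the appendix (bit 1 is the coefficient of $\mathbf{w}^2$, bit 0 the coefficient of $\mathbf{w}$, so the field's 1 is `0b11`); the function names are hypothetical.

```python
def gf4_mul(a: int, b: int) -> int:
    """Multiply in GF(2^2), normal basis [w^2, w], sharing the bit-sum product."""
    a1, a0 = a >> 1, a & 1
    b1, b0 = b >> 1, b & 1
    shared = (a1 ^ a0) & (b1 ^ b0)
    return (((a1 & b1) ^ shared) << 1) | ((a0 & b0) ^ shared)


def gf4_inv(a: int) -> int:
    """Inverse in GF(2^2): a bit swap, which is also squaring in this basis."""
    return ((a & 1) << 1) | (a >> 1)


ONE = 0b11  # w^2 + w = 1


def inversion_is_additive() -> bool:
    """Check (a + m)^-1 == a^-1 + m^-1 for all data/mask pairs."""
    return all(
        gf4_inv(a ^ m) == gf4_inv(a) ^ gf4_inv(m)
        for a in range(4)
        for m in range(4)
    )
```

Because the bit swap is linear over $GF(2)$, the mask correction at this level costs no gates at all, only wiring.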

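In the larger fields, here is how the mask corrections can be calculated. We indicate the
masked version of the input byte $\mathbf{A}$ with a tilde: $\tilde{\mathbf{A}}$, and similarly for the other masked
quantities. So the input byte to the masked $GF(2^8)$ inverter is

$$\tilde{\mathbf{A}} = (\mathbf{A} \oplus \mathbf{M}) = \tilde{A}_1 \mathbf{Y}^{16} + \tilde{A}_0 \mathbf{Y}, \tag{10}$$

being the data byte $\mathbf{A}$ already masked by the (known) mask $\mathbf{M} = M_1 \mathbf{Y}^{16} + M_0 \mathbf{Y}$. Let

$$\begin{aligned}
\tilde{B} &= \tilde{A}_1 \otimes \tilde{A}_0 \oplus N \otimes (\tilde{A}_1 \oplus \tilde{A}_0)^2 && (11) \\
&\quad \oplus \tilde{A}_1 \otimes M_0 \oplus \tilde{A}_0 \otimes M_1 \oplus M_1 \otimes M_0 && (12)
\end{aligned}$$

where the second line shows the additional correction terms required, and the result $\tilde{B}$ is $B$
above, masked by $N \otimes (M_1 \oplus M_0)^2$. Note that, since $M_1$ and $M_0$ are random, then so is their
sum, so is its square (an isomorphism), and so is the square scaled by $N \neq 0$ (a bijection),
that is, the uniform distribution of masks is preserved.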
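
The claim that the masked norm equals the unmasked norm plus the mask $N \otimes (M_1 \oplus M_0)^2$ can be verified by brute force. The following Python sketch makes two assumptions that differ from the paper's circuit: $GF(2^4)$ is represented in a polynomial basis modulo $x^4 + x + 1$ rather than the normal basis of [2], and `N_EXAMPLE` is an arbitrary nonzero constant. The identity is purely algebraic, so it does not depend on either choice.

```python
def gf16_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4) modulo x^4 + x + 1 (assumed representation)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x10:
            a ^= 0x13
        b >>= 1
    return product


def square_scale(x: int, n: int) -> int:
    """The combined operation n * x^2."""
    return gf16_mul(n, gf16_mul(x, x))


def norm(a1: int, a0: int, n: int) -> int:
    """Unmasked norm: a1*a0 + n*(a1 + a0)^2."""
    return gf16_mul(a1, a0) ^ square_scale(a1 ^ a0, n)


def masked_norm(a1m: int, a0m: int, m1: int, m0: int, n: int) -> int:
    """Masked norm computed only from masked data and the mask halves."""
    corrections = gf16_mul(a1m, m0) ^ gf16_mul(a0m, m1) ^ gf16_mul(m1, m0)
    return norm(a1m, a0m, n) ^ corrections


N_EXAMPLE = 0x9  # hypothetical nonzero scaling constant


def output_mask_is_as_claimed(n: int = N_EXAMPLE) -> bool:
    """Exhaustively check masked_norm == norm + n*(m1 + m0)^2."""
    return all(
        masked_norm(a1 ^ m1, a0 ^ m0, m1, m0, n)
        == norm(a1, a0, n) ^ square_scale(m1 ^ m0, n)
        for a1 in range(16)
        for a0 in range(16)
        for m1 in range(16)
        for m0 in range(16)
    )
```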

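For the subfield inversion, say $\tilde{B} = \tilde{\mathbf{b}}_1 Z^4 + \tilde{\mathbf{b}}_0 Z$, call the mask $M_2 = N \otimes (M_1 \oplus M_0)^2 =
\mathbf{m}_1 Z^4 + \mathbf{m}_0 Z$, and let

$$\begin{aligned}
\tilde{\mathbf{c}} &= \tilde{\mathbf{b}}_1 \otimes \tilde{\mathbf{b}}_0 \oplus \mathbf{n} \otimes (\tilde{\mathbf{b}}_1 \oplus \tilde{\mathbf{b}}_0)^2 && (13) \\
&\quad \oplus \tilde{\mathbf{b}}_1 \otimes \mathbf{m}_0 \oplus \tilde{\mathbf{b}}_0 \otimes \mathbf{m}_1 \oplus \mathbf{m}_1 \otimes \mathbf{m}_0 && (14)
\end{aligned}$$

so $\tilde{\mathbf{c}}$ is $\mathbf{c}$ above, masked by $\mathbf{n} \otimes (\mathbf{m}_1 \oplus \mathbf{m}_0)^2$, and again the uniform distribution of masks is
preserved.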

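In the sub-subfield, say $\tilde{\mathbf{c}} = \tilde{c}_1 \mathbf{w}^2 + \tilde{c}_0 \mathbf{w}$, and let

$$\tilde{\mathbf{c}}^{-1} = \tilde{c}_0 \mathbf{w}^2 + \tilde{c}_1 \mathbf{w} \quad \text{(bit swap)} \tag{15}$$

so $\tilde{\mathbf{c}}^{-1}$ is $\mathbf{c}^{-1}$ above masked by another uniform mask. For later convenience, give this mask
a name: $\mathbf{m}_2 = \mathbf{n}^2 \otimes (\mathbf{m}_1 \oplus \mathbf{m}_0)$.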

The next steps involve only multiplications, which do not preserve the uniform distribution
of masking. Hence (as in Oswald et al. [9]) we need to introduce another additive
mask. This mask could be new, or could be re-used bits from the original mask $\mathbf{M}$. In either
case, this mask must be added first, before all the other mask correction terms are added,
to prevent unmasking the operands.

Say now we introduce a new temporary 4-bit mask $T = \mathbf{t}_1 Z^4 + \mathbf{t}_0 Z$, and let


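$$\begin{aligned}
\tilde{\mathbf{b}}_1^{-1} &= \tilde{\mathbf{b}}_0 \otimes \tilde{\mathbf{c}}^{-1} && (16) \\
&\quad \oplus \mathbf{t}_1 \oplus \tilde{\mathbf{b}}_0 \otimes \mathbf{m}_2 \oplus \mathbf{m}_0 \otimes \tilde{\mathbf{c}}^{-1} \oplus \mathbf{m}_0 \otimes \mathbf{m}_2 && (17) \\
\tilde{\mathbf{b}}_0^{-1} &= \tilde{\mathbf{b}}_1 \otimes \tilde{\mathbf{c}}^{-1} && (18) \\
&\quad \oplus \mathbf{t}_0 \oplus \tilde{\mathbf{b}}_1 \otimes \mathbf{m}_2 \oplus \mathbf{m}_1 \otimes \tilde{\mathbf{c}}^{-1} \oplus \mathbf{m}_1 \otimes \mathbf{m}_2 && (19)
\end{aligned}$$

so that the result $\tilde{B}^{-1} = \tilde{\mathbf{b}}_1^{-1} Z^4 + \tilde{\mathbf{b}}_0^{-1} Z$ is $B^{-1}$ above, masked by $T$ (but is not the inverse
of $\tilde{B}$).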
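
Each half of this step is an instance of one pattern: a product of two masked operands, re-masked by a fresh mask that is added before any correction term. A minimal Python sketch of that pattern in $GF(2^2)$ follows, assuming the normal-basis bit encoding of the appendix; `masked_product` and its parameter names are hypothetical.

```python
def gf4_mul(a: int, b: int) -> int:
    """Multiply in GF(2^2), normal basis [w^2, w] (bit 1 = w^2, bit 0 = w)."""
    a1, a0 = a >> 1, a & 1
    b1, b0 = b >> 1, b & 1
    shared = (a1 ^ a0) & (b1 ^ b0)
    return (((a1 & b1) ^ shared) << 1) | ((a0 & b0) ^ shared)


def masked_product(x_masked: int, y_masked: int,
                   x_mask: int, y_mask: int, fresh_mask: int) -> int:
    """Return (x * y) masked by fresh_mask, without ever unmasking x or y.

    The fresh mask is combined with the masked product first, so every
    partial sum below stays covered by a uniformly distributed mask.
    """
    acc = gf4_mul(x_masked, y_masked) ^ fresh_mask
    acc ^= gf4_mul(x_masked, y_mask)   # correction: masked x times mask of y
    acc ^= gf4_mul(x_mask, y_masked)   # correction: mask of x times masked y
    acc ^= gf4_mul(x_mask, y_mask)     # correction: product of the two masks
    return acc


def corrections_cancel() -> bool:
    """Exhaustively check that the result is x*y plus the fresh mask."""
    values = range(4)
    return all(
        masked_product(x ^ mx, y ^ my, mx, my, t) == gf4_mul(x, y) ^ t
        for x in values for y in values
        for mx in values for my in values for t in values
    )
```

Here `x` plays the role of a half of $B$ and `y` the role of $\mathbf{c}^{-1}$; the same pattern applies one level up with 4-bit operands.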

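Similarly, introduce a new 8-bit mask $\mathbf{S} = S_1 \mathbf{Y}^{16} + S_0 \mathbf{Y}$ for the output, and let

$$\begin{aligned}
\tilde{A}_1^{-1} &= \tilde{A}_0 \otimes \tilde{B}^{-1} && (20) \\
&\quad \oplus S_1 \oplus \tilde{A}_0 \otimes T \oplus M_0 \otimes \tilde{B}^{-1} \oplus M_0 \otimes T && (21) \\
\tilde{A}_0^{-1} &= \tilde{A}_1 \otimes \tilde{B}^{-1} && (22) \\
&\quad \oplus S_0 \oplus \tilde{A}_1 \otimes T \oplus M_1 \otimes \tilde{B}^{-1} \oplus M_1 \otimes T && (23)
\end{aligned}$$

so that the result $\tilde{\mathbf{A}}^{-1} = \tilde{A}_1^{-1} \mathbf{Y}^{16} + \tilde{A}_0^{-1} \mathbf{Y}$ is the answer $\mathbf{A}^{-1}$ above, masked by the
output mask $\mathbf{S}$:

$$\tilde{\mathbf{A}}^{-1} = \mathbf{A}^{-1} \oplus \mathbf{S} \tag{24}$$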

2.3  Re-using  Masks 

Oswald et al. showed that by using parts of the input mask for the intermediate results
and the output, several operations can be eliminated, notably multiplications. We will
follow the same strategy below.

The first place where re-using masks helps is in the masked intermediate result $\tilde{\mathbf{c}}^{-1}$, where
for one subsequent calculation the mask $\mathbf{m}_1$ would be helpful but for another the preferred
mask would be $\mathbf{m}_0$, so we follow Oswald and switch masks. Then starting at (15) above we
modify the calculation as follows:


(25) 


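$$\begin{aligned}
\tilde{\mathbf{c}}^{-1} &= \tilde{c}_0 \mathbf{w}^2 + \tilde{c}_1 \mathbf{w} && (26) \\
&\quad \oplus (\mathbf{m}_1 \oplus \mathbf{m}_2) && (27) \\
\tilde{\mathbf{b}}_1^{-1} &= \tilde{\mathbf{b}}_0 \otimes \tilde{\mathbf{c}}^{-1} && (28) \\
&\quad \oplus \mathbf{m}_{11} \oplus \underline{\tilde{\mathbf{b}}_0 \otimes \mathbf{m}_1} \oplus \mathbf{m}_0 \otimes \tilde{\mathbf{c}}^{-1} \oplus \underline{\mathbf{m}_0 \otimes \mathbf{m}_1} && (29) \\
\tilde{\mathbf{c}}^{-1} &= \tilde{\mathbf{c}}^{-1} \oplus (\mathbf{m}_1 \oplus \mathbf{m}_0) && (30) \\
\tilde{\mathbf{b}}_0^{-1} &= \tilde{\mathbf{b}}_1 \otimes \tilde{\mathbf{c}}^{-1} && (31) \\
&\quad \oplus \mathbf{m}_{10} \oplus \underline{\tilde{\mathbf{b}}_1 \otimes \mathbf{m}_0} \oplus \mathbf{m}_1 \otimes \tilde{\mathbf{c}}^{-1} \oplus \underline{\mathbf{m}_1 \otimes \mathbf{m}_0} && (32)
\end{aligned}$$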


where the underlined products had already been computed previously and may be re-used.
(Parentheses indicate the order of evaluation necessary to avoid unmasking operands, but those
combinations were also available from previous computation.) Note also that the result
$\tilde{B}^{-1} = \tilde{\mathbf{b}}_1^{-1} Z^4 + \tilde{\mathbf{b}}_0^{-1} Z$ is still $B^{-1}$ above, but now masked by $M_1 = \mathbf{m}_{11} Z^4 + \mathbf{m}_{10} Z$, the
upper half of the input mask. Following the same approach of switching masks at the next
level gives


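$$\begin{aligned}
\tilde{A}_1^{-1} &= \tilde{A}_0 \otimes \tilde{B}^{-1} && (33) \\
&\quad \oplus S_1 \oplus \underline{\tilde{A}_0 \otimes M_1} \oplus M_0 \otimes \tilde{B}^{-1} \oplus \underline{M_0 \otimes M_1} && (34) \\
\tilde{B}^{-1} &= \tilde{B}^{-1} \oplus (M_1 \oplus M_0) && (35) \\
\tilde{A}_0^{-1} &= \tilde{A}_1 \otimes \tilde{B}^{-1} && (36) \\
&\quad \oplus S_0 \oplus \underline{\tilde{A}_1 \otimes M_0} \oplus M_1 \otimes \tilde{B}^{-1} \oplus \underline{M_1 \otimes M_0} && (37)
\end{aligned}$$

again allowing the underlined terms to be re-used, and with the output $\tilde{\mathbf{A}}^{-1} = \tilde{A}_1^{-1} \mathbf{Y}^{16} +
\tilde{A}_0^{-1} \mathbf{Y}$ being masked by the output mask $\mathbf{S}$ (which could be the original input mask $\mathbf{M}$, or
not):

$$\tilde{\mathbf{A}}^{-1} = \mathbf{A}^{-1} \oplus \mathbf{S} \tag{38}$$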
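
The mask-switching step at this level can be exercised numerically. The Python sketch below is a hedged model, not the paper's circuit: it assumes $GF(2^4)$ in a polynomial basis modulo $x^4 + x + 1$, an arbitrary nonzero constant `N_EXAMPLE` in place of the true norm, and randomly sampled inputs from a fixed seed. It starts from $B^{-1}$ masked by $M_1$, produces the high output half, switches the mask to $M_0$, and produces the low half.

```python
import random


def gf16_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4) modulo x^4 + x + 1 (assumed representation)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x10:
            a ^= 0x13
        b >>= 1
    return product


def gf16_inv(a: int) -> int:
    """a^14 equals a^-1 for nonzero a in GF(2^4), and maps 0 to 0."""
    result = 1
    for _ in range(14):
        result = gf16_mul(result, a)
    return result


N_EXAMPLE = 0x9  # hypothetical nonzero scaling constant


def masked_outputs(a1m, a0m, m1, m0, b_inv_m1, s1, s0):
    """Both output halves, given B^-1 masked by m1; the output mask goes in first."""
    hi = gf16_mul(a0m, b_inv_m1) ^ s1
    hi ^= gf16_mul(a0m, m1) ^ gf16_mul(m0, b_inv_m1) ^ gf16_mul(m0, m1)
    b_inv_m0 = b_inv_m1 ^ (m1 ^ m0)        # switch mask m1 -> m0
    lo = gf16_mul(a1m, b_inv_m0) ^ s0
    lo ^= gf16_mul(a1m, m0) ^ gf16_mul(m1, b_inv_m0) ^ gf16_mul(m1, m0)
    return hi, lo


def sampled_check(samples: int = 2000, seed: int = 0) -> bool:
    """Compare masked outputs with the unmasked products plus output mask."""
    rng = random.Random(seed)
    for _ in range(samples):
        a1, a0, m1, m0, s1, s0 = (rng.randrange(16) for _ in range(6))
        s = a1 ^ a0
        b = gf16_mul(a1, a0) ^ gf16_mul(N_EXAMPLE, gf16_mul(s, s))
        b_inv = gf16_inv(b)
        hi, lo = masked_outputs(a1 ^ m1, a0 ^ m0, m1, m0, b_inv ^ m1, s1, s0)
        if hi != gf16_mul(a0, b_inv) ^ s1 or lo != gf16_mul(a1, b_inv) ^ s0:
            return False
    return True
```

In a circuit, the products of a masked input half with a mask half, and the product of the two mask halves, would be taken from the earlier norm computation rather than recomputed as they are in this sketch.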


2.4  Re-using  Masks  between  rounds 

Many  of  the  mask  correction  terms  used  in  the  masked  inversion  above  involve  only  the 
input  mask,  independent  of  the  masked  data.  This  is  also  true  of  all  the  mask  correction 
term  calculations  in  the  other  steps  of  each  round  of  encryption,  as  those  other  steps  are  all 
linear  (with  respect  to  the  additive  mask).  Then,  if  the  original  128-bit  mask  for  a  block  of 
data  were  to  be  re-used  for  every  round,  all  those  data-independent  correction  terms  would 
be  the  same  for  each  round.  For  implementations  where  the  round  loop  is  “unrolled”  with 
S-boxes  for  each  round,  these  terms  would  only  need  computing  once,  then  could  be  passed 
along  to  all  the  other  rounds.  This  would  save  the  re-computation  of  all  those  mask  terms, 
eliminating  the  associated  circuitry,  at  the  modest  cost  of  the  “wiring”  required  to  pass  along 
the  correction  terms.  Of  course,  one  would  use  a  new  random  mask  with  each  new  block  of 
data  in  Round  0,  to  ensure  that  over  time  the  distribution  of  masks  remains  uniform. 
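
Since the other round steps are linear, their mask corrections are obtained by applying the same operation to the mask alone, which is why those terms depend only on the mask. The Python sketch below illustrates this for MixColumns on a single column; it is a plain reference model with hypothetical function names, and the sample column is the widely used MixColumns test vector.

```python
def xtime(a: int) -> int:
    """Multiply a byte by 2 in the AES field."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a


def mix_column(col: list[int]) -> list[int]:
    """AES MixColumns applied to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def xor_columns(x: list[int], y: list[int]) -> list[int]:
    """Byte-wise XOR of two columns."""
    return [p ^ q for p, q in zip(x, y)]


data = [0xDB, 0x13, 0x53, 0x45]
mask = [0x3A, 0xC4, 0x07, 0x91]          # arbitrary example mask

masked_result = mix_column(xor_columns(data, mask))
mask_correction = mix_column(mask)        # data-independent, reusable per round
unmasked_result = xor_columns(masked_result, mask_correction)
```

If the same block mask is re-used every round, `mask_correction` is identical in every round and need only be computed once.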

More  precisely,  one  way  to  do  this  starts  by  picking  a  random  128-bit  mask  that  will 
be  used  as  the  output  mask  (whose  bytes  correspond  to  S  above)  from  the  inversion  step. 
Then  after  each  byte  undergoes  the  basis  change  (from  the  tower  field  form)  and  affine 
transformation  part  of  the  S-box  (excluding  the  additive  constant),  the  ShiftRows  step  is 
applied  to  the  whole  mask;  the  result  is  the  output  mask  after  the  last  round  of  encryption 
(which  lacks  the  MixCols  step).  Then  MixCols  is  applied  to  that,  giving  the  input  mask  to 
be added to the initial data before Round 0. Applying byte-wise the basis change (to the
tower field form) gives the input mask (corresponding to $\mathbf{M}$ above) for the inversion step.
From this can be computed such terms as $M_1 \otimes M_0$, $M_2$, $\mathbf{m}_1 \otimes \mathbf{m}_0$, and $\mathbf{m}_2$ above, to be re-used
each round. Then the only correction terms that would need computing in each round are
the data-dependent terms (e.g. $\tilde{A}_1 \otimes M_0$ above) of the inversion step.

But  this  only  makes  sense  if  the  application  has  enough  room  to  unroll  the  loop.  In  cases 
where  compactness  is  paramount  the  same  few  S-boxes  would  be  employed  for  each  round; 
using pre-computed correction terms from round to round would then require extra registers,
a cost rather than a saving.

2.5 Security of Masks

Here  we  show  that  this  masked  inversion  operation  is  secure,  by  which  we  mean,  given  input 
and  output  masks  uniformly  distributed,  then  the  distribution  of  each  masked  operand  is 
independent  of  the  distribution  of  the  data. 

First note that, for a variable $x$ uniformly distributed in a finite field $F$, applying
any bijection (one-to-one, onto mapping from the field to itself) $f : F \to F$ will give a new
variable $y = f(x)$ that is also uniformly distributed. In particular, any isomorphism is a
bijection.

Also, for a string of $n$ bits $[b_1, b_2, \ldots, b_n]$ uniformly distributed over the set of all $2^n$
such strings of the same length, any substring $[b_i, b_{i+1}, \ldots, b_j]$ will also be uniformly
distributed over the set of such strings of the same length.

Now consider the operations we perform with the initial, uniformly distributed masks $m$.
Adding data $a$ to give masked data $\tilde{a} = a \oplus m$ is a bijection $f(m) = m \oplus a = \tilde{a}$; regardless of
what the data $a$ is, the masked data $\tilde{a}$ retains the uniform distribution of the mask. Splitting
a mask into two halves gives two independent masks uniformly distributed over the subfield.
Adding two independent masks results in another uniformly distributed mask. Squaring is
an isomorphism in $GF(2^n)$, so squaring a mask gives another uniformly distributed mask.
Similarly to addition, multiplying by a nonzero constant value is a bijection; here the
constant is the norm of the basis elements and so cannot be zero. So, for example, $M_2$ above,
the mask for $\tilde{B}$, retains the uniform distribution of $\mathbf{M}$ from which it was derived using the
above operations; similarly $\mathbf{m}_2$.

Multiplying  two  independent  uniformly  distributed  variables  does  not  give  a  uniformly 
distributed  product.  The  latter  half  of  the  inverse  calculation  involves  only  multiplications, 
including  mask  correction  terms,  so  no  such  term  can  act  as  a  mask,  and  adding  all  such 
terms  would  unmask  the  operand.  (Note  that  each  of  these  individual  products  has  the  same 
distribution  as  the  product  of  two  random  variables,  so  is  not  related  to  the  unmasked  data.) 
Here,  first  starting  with  a  new  uniformly  distributed  mask  and  then  adding  products  to  it 
will  ensure  that  each  intermediate  result  maintains  the  uniform  distribution  of  the  mask,  as 
pointed out by [9]. Therefore, the distribution for each intermediate term (either uniform or
product  of  uniform)  is  independent  of  the  data,  and  the  calculation  is  secure. 
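
The two distributions mentioned here can be tabulated directly. The Python sketch below counts outcomes over $GF(2^4)$, assumed here in a polynomial basis modulo $x^4 + x + 1$ (the counts do not depend on the representation): first for the product of two independent uniform values, then for the same product with an independent uniform mask added.

```python
from collections import Counter


def gf16_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4) modulo x^4 + x + 1 (assumed representation)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x10:
            a ^= 0x13
        b >>= 1
    return product


# Product of two independent uniform variables: zero is over-represented.
product_counts = Counter(
    gf16_mul(x, y) for x in range(16) for y in range(16)
)

# The same product with an independent uniform mask added: exactly uniform.
masked_counts = Counter(
    t ^ gf16_mul(x, y)
    for t in range(16) for x in range(16) for y in range(16)
)
```

Out of 256 equally likely input pairs, the product is zero 31 times and each nonzero value occurs 15 times, so a bare product cannot serve as a mask, whereas the masked sum hits every value equally often.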

3  Implementation  Details 

The appendix gives Verilog code for a masked S-box using the merged architecture of Satoh
et al., which combines the S-box with the inverse S-box (for decryption), sharing the Galois
inverter.  The  tower  field  representation  here  uses  the  same  normal  bases  used  by  Canright 
in  the  unmasked  version.  Here  both  the  input  mask  and  the  output  mask  are  parameters, 
along  with  the  masked  data  byte.  The  code  has  been  tested  in  an  FPGA  implementation 
and  shown  to  give  correct  results  for  every  combination  of  encryption/decryption,  data  byte, 
input  mask,  and  output  mask  (33,554,432  combinations). 

The same types of optimizations for minimal circuitry as used by Canright for the unmasked
S-box were applied to the masked version. Among these optimizations are the re-use
of bit sums for factors in multipliers (using normal bases, all factors are shared between two
multipliers) and the use of NOR gates where appropriate to replace a combination of NAND
and XOR gates (minimizing the size for the 0.13-µm CMOS standard cell library [14]
considered). The tables give the results for the masked Galois inverter and the basis change
(bit matrices) separately. Results are shown by number and type of logic operations, and
also by total “gates,” where the number refers to the equivalent number of NAND gates
(rounded to whole numbers), using our standard cell library. We use the equivalencies 1
XOR/XNOR = 7/4 NAND gates, 1 NOR = 1 NAND gate, 1 NOT = 3/4 NAND gate, and 1
MUX21I = 7/4 NAND gates [14].

Table 1: Inverter Size. Here we compare the masked inverter with the unmasked version,
where total gates is in NAND equivalents.

Inverter    gate counts                 total gates
masked      217 XOR, 94 NAND, 6 NOR     480
unmasked    56 XOR, 34 NAND, 6 NOR      138

Table 2: Basis Change Sizes. Here we compare gates needed in the basis change bit matrices
(including the affine transformation but excluding the Galois inverter) for a merged S-box &
inverse, S-box alone, and inverse S-box alone, using different input and output masks, same
mask for both, or no mask. Both individual gate counts and NAND equivalents are given.

Basis Change    merged                           S-box          (S-box)^-1
2 masks         78 XOR, 4 NOT, 32 MUX = 196      49 XOR = 86    50 XOR = 88
1 mask          58 XOR, 3 NOT, 24 MUX = 146      44 XOR = 77    45 XOR = 79
unmasked        38 XOR, 2 NOT, 16 MUX = 96       24 XOR = 42    25 XOR = 44

Note  that  the  additional  resources  needed  to  use  different  masks  on  input  and  output 
are  significant  for  the  merged  architecture,  but  not  for  dedicated  encryption  (or  decryption) 
only.  There  is  little  reason  not  to  use  the  input  mask  for  the  output  as  well.  In  this  case, 
the  size  for  the  merged  architecture  where  encryption  and  decryption  share  an  inverter  is 
626  NAND  equivalents,  almost  three  times  the  size  of  the  unmasked  version  (234).  (For 
encryption  only,  not  merged,  the  S-box  with  a  single  mask  for  both  input  and  output  is  557 
NAND  equivalents,  compared  with  180  for  unmasked;  again  masked  is  three  times  larger.) 
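
The totals quoted above follow from the gate counts in the tables. The short Python sketch below recomputes them, assuming weights of 7/4 NAND for each XOR and each MUX and 3/4 NAND for each NOT (values that reproduce every total in Tables 1 and 2), with rounding to the nearest whole gate; the dictionary and function names are illustrative.

```python
from fractions import Fraction
import math

# Assumed NAND-equivalent weights per gate type.
WEIGHTS = {
    "XOR": Fraction(7, 4),
    "NAND": Fraction(1),
    "NOR": Fraction(1),
    "NOT": Fraction(3, 4),
    "MUX": Fraction(7, 4),
}


def nand_equivalents(**gate_counts: int) -> int:
    """Weighted gate total, rounded to the nearest whole NAND equivalent."""
    exact = sum(WEIGHTS[gate] * count for gate, count in gate_counts.items())
    return math.floor(exact + Fraction(1, 2))


inverter_masked = nand_equivalents(XOR=217, NAND=94, NOR=6)
inverter_unmasked = nand_equivalents(XOR=56, NAND=34, NOR=6)
merged_one_mask = nand_equivalents(XOR=58, NOT=3, MUX=24)
merged_unmasked = nand_equivalents(XOR=38, NOT=2, MUX=16)
sbox_one_mask = nand_equivalents(XOR=44)
sbox_unmasked = nand_equivalents(XOR=24)

merged_total_masked = inverter_masked + merged_one_mask
merged_total_unmasked = inverter_unmasked + merged_unmasked
encrypt_only_masked = inverter_masked + sbox_one_mask
encrypt_only_unmasked = inverter_unmasked + sbox_unmasked
```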

However,  if  the  current  approach  were  used  in  an  application  where  the  loop  of  rounds 
was  “unrolled”  (requiring  enough  room  for  at  least  160  S-boxes),  the  masks  could  be  re-used 
from  round  to  round,  as  discussed  above.  This  would  require  passing  along  the  extra  bits 
of  pre-computed  corrections  between  rounds.  For  one  S-box,  the  total  number  of  mask-term 
bits  would  be  43,  as  compared  to  8  bits  for  an  input  mask  alone  (to  be  used  as  output  mask 
also,  or  16  bits  for  two  different  masks).  These  extra  wires  would  replace  33  XORs  and 
12  NANDs  in  the  inverter,  and  all  of  the  mask  basis  change  calculation,  e.g.  40  XORs,  16 
MUXs,  and  2  NOTs  for  the  merged  S-box  with  two  different  masks,  or  for  an  S-box  alone 
with  one  mask,  20  XORs  (no  MUXs  or  NOTs).  So  re-using  masks  between  rounds  would  give 
a  masked  merged  S-box  of  506  NAND  equivalents,  rather  than  the  626  above.  In  addition 
to this saving per S-box (after the first round), the MixCols operation on the mask block
would also be eliminated (again, after the first round).

Direct comparison with Oswald et al. [9] is difficult at the level of optimization employed
here. Their terms of comparison are operations in $GF(2^4)$, excluding addition; in their
Table 1 they list 9 multiplications, 2 squarings, and 2 multiplications by a constant (or
scalings). In those terms, the present approach is almost the same, except we require only
8 multiplications instead of 9. But each of the two squarings is followed by a scaling (same
constant), so our approach treats square-scale as a single operation, which we have optimized
down to 3 single-bit XORs, less than one of the 4-bit additions that are not counted there.
While a detailed comparison is difficult in the absence of specifics, to the best of our knowledge ours
is the smallest masked S-box to date.


4  Conclusion 

For  some  hardware  implementations  of  AES,  countermeasures  against  side-channel  attacks 
can  be  important.  Here  we  give  a  method  for  masking  the  S-box  (the  rest  of  a  round  being 
linear)  that  is  secure,  in  that  the  distributions  of  all  the  masked  operands  are  independent  of 
the  distribution  of  the  data.  This  masked  S-box  has  been  optimized  for  minimal  chip  area, 
giving  the  smallest  masked  S-box  of  which  we  are  aware. 

The  overhead  for  masking  nearly  triples  the  size  of  the  S-box,  from  234  gates  to  626 
gates  for  the  merged  version.  In  applications  with  sufficient  resources  to  unroll  the  round 
loop  (where  the  compactness  of  the  S-box  allows  more  copies  for  a  given  area),  some  savings 
may  result  from  re-using  the  block  mask  between  rounds.  Then  each  S-box  (after  the  first 
round)  would  require  only  506  gates,  a  little  over  twice  the  size  of  the  unmasked  version. 

Appendix: S-box Algorithm in Verilog

/*  S-box  &  inverse  with  MASKING,  using  all  normal  bases  */ 

/*  based  on  compact  S-box  using  Canright  algorithm  */ 

/*  optimized  using  NOR  gates  and  NAND  gates  */ 

/* multiply in GF(2^2), shared factors, using normal basis [Omega^2,Omega] */
module GF_MULS_2 ( A, B, Q );
  input [2:0] A; /* shared factors include bit sum: sum hi lo */
  input [2:0] B;
  output [1:0] Q;
  wire abcd, p, q;

  assign abcd = ~(A[2] & B[2]); /* note: ~& syntax for NAND won't compile */
  assign p = (~(A[1] & B[1])) ^ abcd;
  assign q = (~(A[0] & B[0])) ^ abcd;
  assign Q = { p, q };
endmodule

/* multiply & scale by N in GF(2^2), shared factors, basis [Omega^2,Omega] */
module GF_MULS_SCL_2 ( A, B, Q );
  input [2:0] A; /* shared factors include bit sum: sum hi lo */
  input [2:0] B;
  output [1:0] Q;
  wire t, p, q;

  assign t = ~(A[0] & B[0]); /* note: ~& syntax for NAND won't compile */
  assign p = (~(A[2] & B[2])) ^ t;
  assign q = (~(A[1] & B[1])) ^ t;
  assign Q = { p, q };
endmodule

/* sums for shared factors, 2-bit -> 3 */
module FAC_2 ( a, Q );
  input [1:0] a;
  output [2:0] Q;
  wire sa;

  assign sa = a[1] ^ a[0];
  /* output is three 1-bit shared factors: sum hi lo */
  assign Q = { sa, a };
endmodule

/* inverse in GF(2^4)/GF(2^2), using normal basis [alpha^8, alpha^2] */
module GF_INV_4 ( A, M, N, O, Q );
  input [3:0] A;
  input [3:0] M; /* input mask */
  input [3:0] N; /* output mask */
  input [3:0] O; /* outer mask-switch terms, to save 2 XORs */
  output [3:0] Q;
  wire [1:0] a, b, m, n, c, e, d, p, q, an, mb, mn, dn, em, pm, qm;
  wire [2:0] af, bf, mf, nf, ef, df; /* factors w/ bit sums */

  assign a = A[3:2];
  assign b = A[1:0];
  assign m = M[3:2];
  assign n = M[1:0];
  FAC_2 afac(a, af);
  FAC_2 bfac(b, bf);
  FAC_2 mfac(m, mf);
  assign nf = { O[1], n };
  GF_MULS_2 anmul(af, nf, an);
  GF_MULS_2 mbmul(mf, bf, mb);
  GF_MULS_2 mnmul(mf, nf, mn);
  /* optimize section below using NOR gates */
  assign c = { /* note: ~| syntax for NOR won't compile */
      ~(a[1] | b[1]) ^ (~(af[2] & bf[2])) ,
      ~(af[2] | bf[2]) ^ (~(a[0] & b[0])) }
      ^ an ^ mb ^ mn ;
  /* end of NOR optimization */
  assign e = { /* inverse masked by n (lo input mask) */
      c[0] ^ n[0] ^ mf[2] ,
      c[1] ^ m[1] ^ nf[2] };
  FAC_2 efac(e, ef);
  GF_MULS_2 qmul(ef, af, q);
  GF_MULS_2 emmul(ef, mf, em);
  /* NOTE: to maintain masking,
     the output mask N must be added BEFORE p, q are added to other terms */
  assign qm = N[1:0] ^ an ^ em ^ mn; /* mask terms for q (lo output) */
  assign d = { /* switch masks: n -> m (hi input mask) */
      c[0] ^ O[3] ,
      e[0] ^ m[0] ^ n[0] };
  FAC_2 dfac(d, df);
  GF_MULS_2 pmul(df, bf, p);
  GF_MULS_2 dnmul(df, nf, dn);
  assign pm = N[3:2] ^ mb ^ dn ^ mn; /* mask terms for p (hi output) */
  assign Q = { (pm ^ p), (qm ^ q) };
endmodule

/* multiply in GF(2^4)/GF(2^2), shared factors, basis [alpha^8, alpha^2] */
module GF_MULS_4 ( A, B, Q );
  input [8:0] A; /* shared factors include bit sums: sum hi lo */
  input [8:0] B;
  output [3:0] Q;
  wire [1:0] ph, pl, p;

  GF_MULS_SCL_2 summul( A[8:6], B[8:6], p);
  GF_MULS_2 himul(A[5:3], B[5:3], ph);
  GF_MULS_2 lomul(A[2:0], B[2:0], pl);
  assign Q = { (ph ^ p), (pl ^ p) };
endmodule

/* sums for shared factors, 4-bit -> 9 */
module FAC_4 ( a, Q );
  input [3:0] a;
  output [8:0] Q;
  wire [1:0] sa;
  wire al, ah, aa;

  assign sa = a[3:2] ^ a[1:0];
  assign al = a[1] ^ a[0];
  assign ah = a[3] ^ a[2];
  assign aa = sa[1] ^ sa[0];
  /* output is three 3-bit shared factors: sum hi lo */
  assign Q = { aa, sa, ah, a[3:2], al, a[1:0] };
endmodule
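The 9-bit layout produced by FAC_4 is easy to misread, so a small Python model may help. It is illustrative only. The helper `fac2` is a hypothetical stand-in for the 2-bit factor module and assumes that a 3-bit factor is the bit sum followed by the two original bits.

```python
def fac2(x):
    """Assumed 2-bit -> 3-bit factor: {x1 ^ x0, x1, x0}."""
    x1, x0 = (x >> 1) & 1, x & 1
    return ((x1 ^ x0) << 2) | (x & 3)


def fac4(a):
    """4-bit -> 9-bit factor: three 3-bit groups, ordered sum, hi, lo."""
    hi, lo = (a >> 2) & 3, a & 3
    sa = hi ^ lo                       # sum of the two halves
    return (fac2(sa) << 6) | (fac2(hi) << 3) | fac2(lo)
```

For example, `fac4(0b1001)` gives `0b011_110_101`: the sum group for `sa = 11`, the high group for `10`, and the low group for `01`. These are the slices `[8:6]`, `[5:3]`, and `[2:0]` that GF_MULS_4 passes to its three multipliers.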

/* inverse in GF(2^8)/GF(2^4), using normal basis [d^16, d] */
module GF_INV_8 ( A, M, N, Q );
  input [7:0] A;
  input [7:0] M; /* input mask */
  input [7:0] N; /* output mask */
  output [7:0] Q;
  wire [3:0] a, b, m, n, o, c, d, e, p, q, m4, an, mb, mn, dn, em, pm, qm;
  wire [8:0] af, bf, mf, nf, ef, df; /* factors w/ bit sums */
  wire c1, c2, c3; /* for temp var */

  assign a = A[7:4];
  assign b = A[3:0];
  assign m = M[7:4];
  assign n = M[3:0];
  assign o = m ^ n; /* to switch masks below; has useful bits */
  FAC_4 afac(a, af);
  FAC_4 bfac(b, bf);
  FAC_4 mfac(m, mf);
  FAC_4 nfac(n, nf);
  GF_MULS_4 anmul(af, nf, an);
  GF_MULS_4 mbmul(mf, bf, mb);
  GF_MULS_4 mnmul(mf, nf, mn);
  /* optimize section below using NOR gates */
  assign c1 = ~(af[5] & bf[5]);
  assign c2 = ~(af[6] & bf[6]);
  assign c3 = ~(af[8] & bf[8]);
  assign c = { /* note: ~| syntax for NOR won't compile */
    (~(af[6] | bf[6]) ^ (~(a[3] & b[3]))) ^ c1 ^ c3 ,
    (~(af[7] | bf[7]) ^ (~(a[2] & b[2]))) ^ c1 ^ c2 ,
    (~(af[2] | bf[2]) ^ (~(a[1] & b[1]))) ^ c2 ^ c3 ,
    (~(a[0] | b[0]) ^ (~(af[2] & bf[2]))) ^ (~(af[7] & bf[7])) ^ c2 } ^
    an ^ mb ^ mn ;
  /* end of NOR optimization */
  assign m4 = { /* this is input mask for subfield */
    mf[6] ^ nf[6] ,
    mf[7] ^ nf[7] ,
    mf[2] ^ nf[2] ,
    o[0] };
  GF_INV_4 dinv( c, m4, m, o, d); /* inverse masked by m (hi input mask) */
  FAC_4 dfac(d, df);
  GF_MULS_4 pmul(df, bf, p);
  GF_MULS_4 dnmul(df, nf, dn);
  assign pm = N[7:4] ^ mb ^ dn ^ mn; /* mask terms for p (hi output) */
  assign e = d ^ o; /* switch masks: m -> n (lo input mask) */
  FAC_4 efac(e, ef);
  GF_MULS_4 qmul(ef, af, q);
  GF_MULS_4 emmul(ef, mf, em);
  assign qm = N[3:0] ^ an ^ em ^ mn; /* mask terms for q (lo output) */
  assign Q = { (pm ^ p), (qm ^ q) };
endmodule
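The mask-correction terms for the high output half (pm, then pm ^ p) can be checked by exhaustion in a software model. The Python sketch below is illustrative and makes two assumptions that the listing does not spell out: the subfield scaling constant is N = Omega^2, and d and b enter the multiplier carrying the masks m and n respectively. Function names are hypothetical.

```python
def gf4_mul(a, b):
    """Multiply in GF(2^2), normal basis [Omega^2, Omega]."""
    a1, a0, b1, b0 = a >> 1, a & 1, b >> 1, b & 1
    s = (a1 ^ a0) & (b1 ^ b0)
    return (((a1 & b1) ^ s) << 1) | ((a0 & b0) ^ s)


def gf16_mul_slow(a, b):
    """Multiply in GF(2^4)/GF(2^2); sums scaled by N = Omega^2 (assumed)."""
    ah, al, bh, bl = a >> 2, a & 3, b >> 2, b & 3
    t = gf4_mul(ah ^ al, bh ^ bl)
    p = ((t & 1) << 1) | ((t >> 1) ^ (t & 1))   # scale by N
    return ((gf4_mul(ah, bh) ^ p) << 2) | (gf4_mul(al, bl) ^ p)


# Precompute the product table once so the exhaustive check stays fast.
MUL = [[gf16_mul_slow(a, b) for b in range(16)] for a in range(16)]


def masked_hi_output(d, b, m, n, n_out):
    """Follow the listing's order: build pm from N first, then add p."""
    p = MUL[d][b]        # product of the two masked operands
    mb = MUL[m][b]
    dn = MUL[d][n]
    mn = MUL[m][n]
    pm = n_out ^ mb ^ dn ^ mn
    return pm ^ p


def check():
    for n_out in (0x0, 0x9):
        for d_true in range(16):
            for b_true in range(16):
                for m in range(16):
                    for n in range(16):
                        q = masked_hi_output(d_true ^ m, b_true ^ n, m, n, n_out)
                        if q != MUL[d_true][b_true] ^ n_out:
                            return False
                        # mask switch used for e: adding o = m ^ n turns mask m into n
                        if (d_true ^ m) ^ (m ^ n) != d_true ^ n:
                            return False
    return True
```

`check()` returns True when the four correction products cancel every input-mask term, leaving the true product masked only by the output mask. Because pm already contains N before p is added, no partial sum in this model equals the unmasked product.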

/* S-box basis change with MASKING, using all normal bases */

/* MUX21I is an inverting 2:1 multiplexor */
module MUX21I ( A, B, s, Q );
  input A;
  input B;
  input s; /* selection switch */
  output Q;

  assign Q = ~ ( s ? A : B ); /* mock-up for FPGA implementation */
endmodule
/* select and invert (NOT) byte, using MUX21I */
module SELECT_NOT_8 ( A, B, s, Q );
  input [7:0] A;
  input [7:0] B;
  input s; /* selection switch */
  output [7:0] Q;

  MUX21I m7(A[7], B[7], s, Q[7]);
  MUX21I m6(A[6], B[6], s, Q[6]);
  MUX21I m5(A[5], B[5], s, Q[5]);
  MUX21I m4(A[4], B[4], s, Q[4]);
  MUX21I m3(A[3], B[3], s, Q[3]);
  MUX21I m2(A[2], B[2], s, Q[2]);
  MUX21I m1(A[1], B[1], s, Q[1]);
  MUX21I m0(A[0], B[0], s, Q[0]);
endmodule
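Because the multiplexor inverts, any byte routed through SELECT_NOT_8 comes out complemented. A bit-level Python model (illustrative only, with hypothetical function names) makes the behaviour explicit:

```python
def mux21i(a, b, s):
    """Inverting 2:1 multiplexor on single bits: NOT (s ? a : b)."""
    return (a if s else b) ^ 1


def select_not_8(a, b, s):
    """Byte-wide select-and-invert built from eight inverting multiplexors."""
    q = 0
    for i in range(8):
        q |= mux21i((a >> i) & 1, (b >> i) & 1, s) << i
    return q
```

For instance, `select_not_8(0xF0, 0x0F, 1)` yields `0x0F`, the complement of the first input. This is consistent with the basis-change equations producing complemented signals (B[2] is `~A[0]`, for example), so that the inversion in the multiplexor restores the intended value at no extra gate cost.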


/* find either Sbox or its inverse in GF(2^8), by Canright Algorithm */
/* with MASKING: the input mask M and output mask N must be given */
module bSbox ( A, M, N, encrypt, Q );
  input [7:0] A;
  input [7:0] M;
  input [7:0] N;
  input encrypt; /* 1 for Sbox, 0 for inverse Sbox */
  output [7:0] Q;
  wire [7:0] B, C, D, E, F, G, H, V, W, X, Y, Z;
  wire R1, R2, R3, R4, R5, R6, R7, R8, R9;
  wire S1, S2, S3, S4, S5, S6, S7, S8, S9;
  wire T1, T2, T3, T4, T5, T6, T7, T8, T9;
  wire U1, U2, U3, U4, U5, U6, U7, U8, U9, U10;

  /* change basis from GF(2^8) to GF(2^8)/GF(2^4)/GF(2^2) */
  /* combine with bit inverse matrix multiply of Sbox */
  assign R1 = A[7] ^ A[5] ;
  assign R2 = A[7] ~^ A[4] ;
  assign R3 = A[6] ^ A[0] ;
  assign R4 = A[5] ~^ R3 ;
  assign R5 = A[4] ^ R4 ;
  assign R6 = A[3] ^ A[0] ;
  assign R7 = A[2] ^ R1 ;
  assign R8 = A[1] ^ R3 ;
  assign R9 = A[3] ^ R8 ;
  assign B[7] = R7 ~^ R8 ;
  assign B[6] = R5 ;
  assign B[5] = A[1] ^ R4 ;
  assign B[4] = R1 ~^ R3 ;
  assign B[3] = A[1] ^ R2 ^ R6 ;
  assign B[2] = ~ A[0] ;
  assign B[1] = R4 ;
  assign B[0] = A[2] ~^ R9 ;
  assign Y[7] = R2 ;
  assign Y[6] = A[4] ^ R8 ;
  assign Y[5] = A[6] ^ A[4] ;
  assign Y[4] = R9 ;
  assign Y[3] = A[6] ~^ R2 ;
  assign Y[2] = R7 ;
  assign Y[1] = A[4] ^ R6 ;
  assign Y[0] = A[1] ^ R5 ;
  SELECT_NOT_8 sel_in( B, Y, encrypt, Z );

  // convert masks also, but no additive constant for affine
  assign S1 = M[7] ~^ M[5] ;
  assign S2 = M[7] ~^ M[4] ;
  assign S3 = M[6] ~^ M[0] ;
  assign S4 = M[5] ^ S3 ;
  assign S5 = M[4] ^ S4 ;
  assign S6 = M[3] ^ M[0] ;
  assign S7 = M[2] ^ S1 ;
  assign S8 = M[1] ^ S3 ;
  assign S9 = M[3] ^ S8 ;
  assign E[7] = S7 ~^ S8 ;
  assign E[6] = S5 ;
  assign E[5] = M[1] ^ S4 ;
  assign E[4] = S1 ~^ S3 ;
  assign E[3] = M[1] ^ S2 ^ S6 ;
  assign E[2] = ~ M[0] ;
  assign E[1] = S4 ;
  assign E[0] = M[2] ^ S9 ;
  assign F[7] = S2 ;
  assign F[6] = M[4] ^ S8 ;
  assign F[5] = M[6] ~^ M[4] ;
  assign F[4] = S9 ;
  assign F[3] = M[6] ^ S2 ;
  assign F[2] = S7 ;
  assign F[1] = M[4] ~^ S6 ;
  assign F[0] = M[1] ^ S5 ;
  SELECT_NOT_8 sel_Min( E, F, encrypt, V );

  assign T1 = N[7] ~^ N[5] ;
  assign T2 = N[7] ~^ N[4] ;
  assign T3 = N[6] ~^ N[0] ;
  assign T4 = N[5] ^ T3 ;
  assign T5 = N[4] ^ T4 ;
  assign T6 = N[3] ^ N[0] ;
  assign T7 = N[2] ^ T1 ;
  assign T8 = N[1] ^ T3 ;
  assign T9 = N[3] ^ T8 ;
  assign G[7] = T7 ~^ T8 ;
  assign G[6] = T5 ;
  assign G[5] = N[1] ^ T4 ;
  assign G[4] = T1 ~^ T3 ;
  assign G[3] = N[1] ^ T2 ^ T6 ;
  assign G[2] = ~ N[0] ;
  assign G[1] = T4 ;
  assign G[0] = N[2] ^ T9 ;
  assign H[7] = T2 ;
  assign H[6] = N[4] ^ T8 ;
  assign H[5] = N[6] ~^ N[4] ;
  assign H[4] = T9 ;
  assign H[3] = N[6] ^ T2 ;
  assign H[2] = T7 ;
  assign H[1] = N[4] ~^ T6 ;
  assign H[0] = N[1] ^ T5 ;
  SELECT_NOT_8 sel_Mout( H, G, encrypt, W );

  GF_INV_8 inv( Z, V, W, C );

  /* change basis back from GF(2^8)/GF(2^4)/GF(2^2) to GF(2^8) */
  /* combine with matrix multiply of Sbox */
  assign U1 = C[7] ^ C[3] ;
  assign U2 = C[6] ^ C[4] ;
  assign U3 = C[6] ^ C[0] ;
  assign U4 = C[5] ~^ C[3] ;
  assign U5 = C[5] ~^ U1 ;
  assign U6 = C[5] ~^ C[1] ;
  assign U7 = C[4] ~^ U6 ;
  assign U8 = C[2] ^ U4 ;
  assign U9 = C[1] ^ U2 ;
  assign U10 = U3 ^ U5 ;
  assign D[7] = U4 ;
  assign D[6] = U1 ;
  assign D[5] = U3 ;
  assign D[4] = U5 ;
  assign D[3] = U2 ^ U5 ;
  assign D[2] = U3 ^ U8 ;
  assign D[1] = U7 ;
  assign D[0] = U9 ;
  assign X[7] = C[4] ~^ C[1] ;
  assign X[6] = C[1] ^ U10 ;
  assign X[5] = C[2] ^ U10 ;
  assign X[4] = C[6] ~^ C[1] ;
  assign X[3] = U8 ^ U9 ;
  assign X[2] = C[7] ~^ U7 ;
  assign X[1] = U6 ;
  assign X[0] = ~ C[2] ;
  SELECT_NOT_8 sel_out( D, X, encrypt, Q );
endmodule
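The input stage of bSbox converts the masked data and the mask separately, and the two conversions must agree on their linear part. The Python sketch below transcribes the R/B/Y and S/E/F equations and checks that property by exhaustion. It is a cross-check of the equations, not part of the design, and its function names are hypothetical.

```python
def bits(x):
    return [(x >> i) & 1 for i in range(8)]


def xnor(a, b):
    return a ^ b ^ 1


def pack_inverted(sel):
    """Model SELECT_NOT_8: complement the selected byte (list index = bit number)."""
    return sum((b ^ 1) << i for i, b in enumerate(sel))


def data_in(A, encrypt):
    """Basis change for the data path (signal Z), from the R, B, Y equations."""
    a = bits(A)
    r1 = a[7] ^ a[5]
    r2 = xnor(a[7], a[4])
    r3 = a[6] ^ a[0]
    r4 = xnor(a[5], r3)
    r5 = a[4] ^ r4
    r6 = a[3] ^ a[0]
    r7 = a[2] ^ r1
    r8 = a[1] ^ r3
    r9 = a[3] ^ r8
    B = [xnor(a[2], r9), r4, a[0] ^ 1, a[1] ^ r2 ^ r6,
         xnor(r1, r3), a[1] ^ r4, r5, xnor(r7, r8)]
    Y = [a[1] ^ r5, a[4] ^ r6, r7, xnor(a[6], r2),
         r9, a[6] ^ a[4], a[4] ^ r8, r2]
    return pack_inverted(B if encrypt else Y)


def mask_in(M, encrypt):
    """Basis change for the input mask (signal V), from the S, E, F equations."""
    m = bits(M)
    s1 = xnor(m[7], m[5])
    s2 = xnor(m[7], m[4])
    s3 = xnor(m[6], m[0])
    s4 = m[5] ^ s3
    s5 = m[4] ^ s4
    s6 = m[3] ^ m[0]
    s7 = m[2] ^ s1
    s8 = m[1] ^ s3
    s9 = m[3] ^ s8
    E = [m[2] ^ s9, s4, m[0] ^ 1, m[1] ^ s2 ^ s6,
         xnor(s1, s3), m[1] ^ s4, s5, xnor(s7, s8)]
    F = [m[1] ^ s5, xnor(m[4], s6), s7, m[6] ^ s2,
         s9, xnor(m[6], m[4]), m[4] ^ s8, s2]
    return pack_inverted(E if encrypt else F)


def masks_track_data():
    """Converted masked data must equal converted data XOR converted mask."""
    return all(
        data_in(a ^ m, enc) == data_in(a, enc) ^ mask_in(m, enc)
        for enc in (True, False) for a in range(256) for m in range(256)
    )
```

`masks_track_data()` holds in both modes. In addition `mask_in(0, enc)` is zero, reflecting the comment that the mask conversion carries no additive constant, whereas `data_in(0, False)` is nonzero because the decryption path folds in the constant of the inverse affine transformation.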

/* test program: put Sbox output into register */
/* with MASKING: the input mask M and output mask N must be given */
module Sbox_m ( A, M, N, S, Si, CLK );
  input [7:0] A;
  input [7:0] M;
  input [7:0] N;
  output [7:0] S;
  output [7:0] Si;
  input CLK /* synthesis syn_noclockbuf=1 */ ;
  reg [7:0] S;
  reg [7:0] Si;
  wire [7:0] s;
  wire [7:0] si;

  bSbox sbe(A, M, N, 1, s);
  bSbox sbd(A, M, N, 0, si);
  always @ (posedge CLK) begin
    S <= s;
    Si <= si;
  end
endmodule
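When simulating the test program, expected register values can be generated from a plain software model of the AES S-box. The Python sketch below is one way to do that. It assumes the interface convention that A is the true input already XORed with M and that both outputs carry the mask N; this reading follows the module comments but is not stated in full there. The function names are hypothetical.

```python
def gf256_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf256_inv(x):
    """Inverse as x^254, which maps 0 to 0 as the S-box requires."""
    r = 1
    for _ in range(254):
        r = gf256_mul(r, x)
    return r


def rotl8(x, k):
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sbox(x):
    """Field inverse followed by the AES affine transformation."""
    y = gf256_inv(x)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63


SBOX = [sbox(x) for x in range(256)]
INV_SBOX = [0] * 256
for _x, _y in enumerate(SBOX):
    INV_SBOX[_y] = _x


def expected_outputs(A, M, N):
    """Expected (S, Si) for masked input A, input mask M, output mask N."""
    x = A ^ M                          # assumed: A arrives masked by M
    return SBOX[x] ^ N, INV_SBOX[x] ^ N
```

A testbench could sweep A, M, and N, and compare the registered S and Si against `expected_outputs(A, M, N)`. For the familiar test value 0x53 with M = 0xA5 and N = 0x3C, the expected S is 0xED ^ 0x3C = 0xD1.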